Critical Care Explorations
Ovid Technologies (Wolters Kluwer Health)
Preprints posted in the last 90 days, ranked by how well they match the content profile of Critical Care Explorations, based on 15 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Basilakis, A.; Duenser, M. W.
Background: The Therapeutic Distance framework (Paper 1) achieved AUC 0.61 for orbit-based mortality prediction in 11,627 sepsis patients. We hypothesised that incorporating state-dependent parameter relevance would substantially improve prediction. Methods: We extended the framework to 84,176 ICU patients from MIMIC-IV v3.1 across 16 clinical syndromes. Validation included full-population leave-one-out (n=59,362), head-to-head comparison against SAPS-II and logistic regression on 34,467 matched patients with bootstrap confidence intervals, temporal validation, outcome permutation, sensitivity analysis, and calibration assessment. Results: Full-population leave-one-out achieved AUC 0.832 (n=59,362). On 34,467 matched patients, Therapeutic Distance (AUC 0.841) significantly outperformed both SAPS-II (0.786; delta=+0.055, 95% CI +0.048 to +0.061, p<0.001) and logistic regression (0.788). Temporal validation showed stable performance (delta=-0.006). Outcome permutation confirmed genuine signal (AUC 0.859 to 0.498 with shuffled mortality). Sensitivity analysis demonstrated near-zero variation (delta 0.0006-0.003). The framework performed well for 8 of 16 syndromes (AUC >0.70) and failed for DKA and post-cardiac surgery (AUC <0.40). Conclusions: Therapeutic Distance provides therapy-specific risk stratification that exceeds both established severity scores and standard machine learning while remaining robust to hyperparameter choices, temporal drift, and outcome permutation.
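The head-to-head comparison above hinges on bootstrap confidence intervals for a paired AUROC difference computed on the same matched patients. A minimal sketch of that style of analysis follows; it is not the authors' code, and the inputs (y, score_a, score_b) are placeholder names.

```python
# Hedged sketch: bootstrap CI for a paired AUROC difference on matched
# patients, in the spirit of the Therapeutic Distance vs SAPS-II comparison.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def paired_auc_delta_ci(y, score_a, score_b, n_boot=2000):
    """Resample patients with replacement; return mean delta and 95% CI."""
    y, score_a, score_b = map(np.asarray, (y, score_a, score_b))
    n, deltas = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y[idx])) < 2:   # AUC needs both outcome classes
            continue
        deltas.append(roc_auc_score(y[idx], score_a[idx])
                      - roc_auc_score(y[idx], score_b[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return float(np.mean(deltas)), float(lo), float(hi)
```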
Wanka, S.-T.; Zilberszac, R.; Hermann, A.; Lenz, M.; Hengstenberg, C.; Schellongowski, P.; Staudinger, T.
Background: Early lactate is widely used to risk-stratify septic shock, yet clinically actionable cut-offs for 28-day mortality remain uncertain. Methods: In a single-centre study conducted across two intensive care units, we analysed 84 adults with septic shock identified within 24 hours of intensive care unit admission. The primary endpoint was 28-day mortality. Four lactate metrics obtained during the first 24 hours were evaluated: first (admission) lactate, last lactate, peak lactate, and lactate clearance from first to last. Associations were tested using logistic regression with and without adjustment for the Simplified Acute Physiology Score 3; discrimination was assessed by area under the receiver-operating characteristic curve (AUROC), and optimal cut-offs were defined by the Youden index. Results: Thirty-nine of 84 patients (46.4%) died by day 28. Higher absolute lactate values were independently associated with death (adjusted odds ratio (OR) per 1 mmol/L increase: First 1.47, p<0.001; Last 1.41, p=0.002; Peak 1.39, p<0.001), whereas Lactate clearance was not (OR 0.65, p=0.202). Discrimination was moderate to good for peak (AUROC 0.817), first (0.791), and last (0.757) lactate, and poor for clearance (0.577). Youden-derived thresholds provided pragmatic trade-offs: First 3.55 mmol/L (sensitivity 0.821, specificity 0.689), Last 3.15 mmol/L (0.567, 0.864), and Peak 3.55 mmol/L (0.973, 0.556). Kaplan-Meier curves using these cut-offs showed early and sustained separation. Conclusions: In adults with septic shock, simple early lactate thresholds around 3.3-3.6 mmol/L (first/peak) and approximately 3.15 mmol/L (last) identify 28-day mortality risk and outperform lactate clearance.
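For readers who want to reproduce this style of analysis, a short sketch of the AUROC-plus-Youden-cutoff computation is given below; the inputs (a binary 28-day mortality vector and one lactate metric per patient) are assumptions, not the authors' code.

```python
# Sketch: AUROC for a lactate metric and the Youden-optimal threshold.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def youden_cutoff(y, lactate):
    """Return (AUROC, threshold, sensitivity, specificity) at the cut-off
    maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thr = roc_curve(y, lactate)
    best = np.argmax(tpr - fpr)          # J evaluated at each candidate cut-off
    return roc_auc_score(y, lactate), thr[best], tpr[best], 1 - fpr[best]
```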
Kunche, N.
Background: Severity scoring systems such as SOFA, NEWS2, and qSOFA effectively identify deteriorating ICU patients by aggregating physiological parameters into composite indices that trigger clinical alerts. However, these systems evaluate patient state at discrete time points and do not model the temporal dynamics of organ deterioration or the pharmacokinetic constraints that govern whether a given intervention can achieve therapeutic effect before an organ trajectory crosses an irreversible threshold. This limitation is consequential because interventions across critical care span pharmacokinetic onset times from seconds (vasopressors) to hours (metabolic corrections, blood products, enzymatic cofactors), yet no existing framework quantifies timing adequacy as a function of these intervention-specific pharmacokinetic properties. Methods: We developed the Multi-Organ Intervention State Space (MOISS), a collision geometry framework that classifies intervention timing adequacy by computing the temporal relationship between the predicted time for a biomarker trajectory to reach a critical threshold and the time required for the administered intervention to achieve peak therapeutic effect. Biomarker trajectories were estimated using the Kunche Adaptive Estimator (KAE), a reliability-adaptive Kalman filter that provides continuous position and velocity estimates from intermittent laboratory measurements. MOISS assigns each intervention event to one of six ordinal categories: PROPHYLACTIC, ON_TIME, PARTIAL, MARGINAL, FUTILE, or TOO_LATE. We applied this framework to 301,470 ICU patients across three databases (eICU-CRD, MIMIC-IV, MIMIC-III), evaluating 65 distinct intervention-organ pairs spanning 10 organ systems: Cardiovascular, Metabolic, Respiratory, Renal, Hematologic, Hepatic, Gastrointestinal, Infection, Endocrine, and Neurological. Results: Timing-mortality associations were identified across all 10 organ systems, with 87 intervention-database combinations achieving statistical significance (p<0.05). The highest timing sensitivity was observed in metabolic corrections: thiamine supplementation for metabolic acidosis (OR 5.76; 95% CI 4.86-6.83 in MIMIC-IV), sodium bicarbonate (OR 4.99; 95% CI 4.27-5.82 in MIMIC-IV). Respiratory interventions showed comparable magnitude: mechanical ventilation initiation (OR 5.03; 95% CI 4.42-5.73 in MIMIC-IV). Hematologic interventions demonstrated strong timing dependency: platelet transfusion (OR 4.25; 95% CI 3.68-4.90), fresh frozen plasma (OR 3.41; 95% CI 2.94-3.95). Cardiovascular agents ranged from OR 1.40 for norepinephrine (consistent with its rapid 1-2 minute onset providing a forgiving therapeutic window) to OR 2.23 for milrinone. Infection-directed therapies, hepatic support, renal replacement, endocrine correction, gastrointestinal interventions, and neurological agents all contained timing-sensitive members. Cross-database consistency was demonstrated for 29 of 52 testable interventions (55.8%), with 6 interventions achieving significance across all three databases. Conclusions: Intervention timing sensitivity is pervasive across the entire spectrum of critical care therapeutics, spanning all 10 organ systems and all pharmacokinetic classes evaluated. 
MOISS provides a systematic framework for quantifying this timing adequacy that complements existing severity scoring by adding the pharmacokinetic timing dimension: where SOFA, NEWS2, and qSOFA identify that a patient is deteriorating, MOISS computes whether the specific planned intervention can still achieve its intended effect given the current organ trajectory and pharmacokinetic constraints. The universality of timing sensitivity across organ systems argues for multi-organ trajectory monitoring as the foundation for next-generation clinical decision support.
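The core of the collision-geometry idea is a comparison between the estimated time for a biomarker to cross its critical threshold and the intervention's time-to-peak-effect. The sketch below illustrates one plausible reading of that logic; the category boundaries are illustrative assumptions, not the published MOISS definitions.

```python
# Hedged sketch of MOISS-style timing classification. Inputs would come from
# a trajectory estimator such as the KAE Kalman filter described above.
def classify_timing(position, velocity, threshold, t_peak_effect):
    """All quantities in consistent units; velocity > 0 means the biomarker
    is moving toward the critical threshold (assumed above position)."""
    if velocity <= 0:
        return "PROPHYLACTIC"              # no predicted threshold crossing
    t_collision = (threshold - position) / velocity
    if t_collision <= 0:
        return "TOO_LATE"                  # threshold already crossed
    margin = t_collision - t_peak_effect   # slack before peak effect arrives
    if margin > t_peak_effect:
        return "ON_TIME"
    if margin > 0:
        return "PARTIAL"
    if margin > -0.5 * t_peak_effect:
        return "MARGINAL"
    return "FUTILE"
```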
Krishnan, P.; Sikora, A.; Murray, B.; Ali, A.; Podgoreanu, M.; Upadhyaya, P.; Gent, A.; CHOUDHARY, T.; Holder, A. L.; Esper, A.; Kamaleswaran, R.
Rationale: Autonomic dysfunction is a hallmark of sepsis pathophysiology, yet its quantification remains challenging. Multiscale entropy (MSE) derived from heart rate variability (HRV) offers a dynamic measure of physiological complexity and may serve as a biomarker of early deterioration associated with subsequent organ failure, vasopressor escalation, or mortality. Objective: To determine whether MSE computed across multiple temporal scales during the first 24 hours of Intensive Care Unit (ICU) admission is associated with short-term mortality and longer-term organ dysfunction in patients with sepsis, and whether these relationships vary across vasopressor exposure. Unlike prior studies that focused on short-term HRV metrics, we applied MSE across multiple temporal scales and incorporated these features into machine learning models to evaluate their prognostic utility in septic shock. Methods: This retrospective cohort study included adult ICU sepsis patients at Emory University Hospital from January 2016 to December 2019. Of 2,076 eligible patients, 958 were propensity matched into two cohorts: fluids-only and fluids-plus-vasopressor, with norepinephrine as the primary vasopressor. High-resolution electrocardiogram (ECG) waveforms were analyzed to compute MSE across 20 temporal scales. Machine learning models using (1) MSE features alone and (2) MSE combined with demographic and vital sign data (MSE-DV) were compared against a model based on traditional HRV measures and against severity of illness scores for predicting outcomes. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), with a primary outcome of mortality at day 7 and secondary outcome of persistent organ dysfunction at day 28. Results: In the fluids-plus-vasopressor cohort, MSE-based models demonstrated superior predictive performance for 7-day mortality (AUROC 0.84) compared to severity of illness scores (AUROC 0.64). MSE-DV models also predicted organ dysfunction including 28-day renal (AUROC 0.75), neurological (AUROC 0.79), and respiratory (AUROC 0.71) dysfunction. Patients receiving second-line and third-line vasopressors and corticosteroids exhibited progressively lower MSE values, particularly at mid-range and long-range scales. Conclusion: MSE features in the first 24 hours of ICU stay predict mortality and organ dysfunction with higher discrimination than traditional severity of illness scores. Future work should validate these findings, assess longitudinal MSE trends, and examine race-specific autonomic patterns to refine predictive models.
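Multiscale entropy is computed by coarse-graining the RR-interval series at each scale and taking the sample entropy of each coarse-grained series. A bare-bones sketch follows (scales 1-20 as in the abstract); production HRV pipelines would add beat detection and artifact rejection, and this is not the study's implementation.

```python
# Minimal multiscale entropy sketch.
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)                 # tolerance as a fraction of SD
    def matches(mm):
        t = np.lib.stride_tricks.sliding_window_view(x, mm)
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=-1)
        return ((d <= r).sum() - len(t)) / 2   # pairs, excluding self-matches
    b, a = matches(m), matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.nan

def multiscale_entropy(rr, scales=range(1, 21)):
    rr = np.asarray(rr, dtype=float)
    out = []
    for s in scales:
        # Coarse-grain: average non-overlapping windows of length s.
        coarse = rr[: len(rr) // s * s].reshape(-1, s).mean(axis=1)
        out.append(sample_entropy(coarse))
    return out
```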
Born, G.
Objective: To develop and validate a predictive model incorporating behavioral telemetry signals (documentation pattern anomalies derived from routine EHR charting) alongside clinical variables for ICU mortality prediction in patients with low acute physiologic derangement. Materials and Methods: Retrospective cohort study of 46,002 adult ICU stays from MIMIC-IV v3.1 (2008-2022) with SOFA scores 0-2, excluding neurological units. We extracted 66 variables spanning demographics, acuity, behavioral telemetry, clinical enrichment, and temporal factors. Progressive logistic regression models (M1-M7) were compared using cross-validation, DeLong tests, net reclassification improvement, and calibration analysis. Results: Overall mortality was 9.34% (4,295 deaths). The clinical model (M5) achieved cross-validated AUROC 0.691 versus 0.639 for demographics alone (M2; ΔAUROC = 0.052, DeLong p = 4.41×10⁻⁴⁷). NRI was 24.3%. Discordant care patients received 30.5% more chart events than concordant patients, with the sole deficit in neurological assessments (-15.4%), refuting the neglect hypothesis. Kaplan-Meier analysis confirmed survival separation (log-rank χ² = 138.6, p = 5.32×10⁻³²). In the most conservative subgroup (SOFA 0, no sedation, no ventilation, N = 11,158), orientation omission remained associated with mortality (adjusted OR 1.52, p = 0.027). Discussion: Deep sedation and mechanical ventilation function as mediators on the causal pathway rather than traditional confounders; the discordant care signal retains significance after full sedation adjustment. Conclusion: Documentation pattern analysis adds measurable predictive value for ICU mortality risk stratification and represents a novel signal for real-time EHR-based clinical decision support.
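Among the comparison metrics listed, the net reclassification improvement is the least standardized; a sketch of the continuous (category-free) variant is shown below, with p_base and p_new standing in for predicted probabilities from the demographics-only and clinical models. Names are assumptions, and ties are counted as downward moves in this simplification.

```python
# Sketch: continuous NRI between a baseline and an extended model.
import numpy as np

def continuous_nri(y, p_base, p_new):
    y, p_base, p_new = map(np.asarray, (y, p_base, p_new))
    up = p_new > p_base
    ev, ne = y == 1, y == 0
    nri_events = up[ev].mean() - (~up[ev]).mean()      # deaths should move up
    nri_nonevents = (~up[ne]).mean() - up[ne].mean()   # survivors should move down
    return nri_events + nri_nonevents
```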
Gehring, M.
Background: Pulse oximeters are typically validated on cohorts of 200-500 subjects under controlled conditions. Whether these cohorts capture the demographic heterogeneity of national clinical practice, and whether measurement error is associated with patient outcomes, has not been established at scale. Methods: We analyzed paired SpO2/SaO2 readings from three independent sources spanning 209 U.S. hospitals: MIMIC-IV (1 hospital; 12,934 ICU stays), eICU-CRD (208 hospitals; 55,178 stays), and the Open Oximetry Repository (PhysioNet; 52.4 million readings with continuous melanin and perfusion indices). Bias was defined as SpO2 - SaO2. Hidden hypoxemia (SpO2 ≥ 94% with SaO2 < 88%) was assessed per ICU stay. Mortality was compared between hidden-hypoxemia-positive and -negative stays with multivariable logistic regression adjusting for age, sex, race, and four laboratory severity markers (cluster-robust SEs by hospital). Sensitivity analyses included landmark restriction (first 48 hours), lactate stratification, alternate thresholds, and patient-level aggregation. PPG signal quality was assessed in 125 ICU patients with demographic-linked waveform data. Results: Bias was minimal at normal perfusion but amplified under low perfusion in high-melanin patients, consistent with known optics: at very low perfusion × high melanin × severe hypoxia, mean bias reached +12.8% (n = 458,571), with 47% of readings constituting hidden severe hypoxemia. National bias in African American patients was +2.76% (n = 529,541; 208 hospitals), 62% higher than academic estimates. Across 55,178 eICU stays, hidden hypoxemia was associated with approximately doubled mortality after adjustment for age, sex, race, and illness severity (adjusted OR 1.86, 95% CI 1.69-2.04, p < 0.001), consistent across all racial groups. Hidden hypoxemia was not a pre-terminal phenomenon: 63% of events occurred >48 hours before death (median first event: 15.3 hours; mean time to death: 151 hours), and the association persisted in landmark analysis (first 48 hours only), in patients with normal lactate (adjusted OR 1.87, 95% CI 1.61-2.16), and when both restrictions were applied simultaneously (16.5% vs. 11.1%). Waveform analysis (n = 125) showed no fixed racial difference in baseline PPG AC/DC ratio (Black: 0.299, White: 0.273), suggesting the signal deficit is conditional on perfusion state. Full extraction (n = 1,545) is in progress. Conclusions: In this multicenter retrospective analysis, national pulse oximetry variance exceeded published benchmarks and was associated with approximately doubled ICU mortality, replicated across 209 U.S. hospitals. Hidden hypoxemia was not a pre-terminal artifact: events occurred throughout the ICU stay at a constant rate, and mortality associations persisted in landmark and lactate-stratified analyses. These findings suggest that current regulatory validation standards may underestimate the real-world prevalence of demographic bias in pulse oximetry, and that perfusion-dependent mechanisms may offer a target for algorithmic correction.
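The hidden-hypoxemia exposure is a simple per-stay flag over time-matched SpO2/SaO2 pairs; a sketch under assumed column names:

```python
# Sketch: per-stay hidden-hypoxemia flag (SpO2 >= 94% while SaO2 < 88%).
import pandas as pd

def flag_hidden_hypoxemia(pairs: pd.DataFrame) -> pd.Series:
    """pairs: columns stay_id, spo2, sao2, already time-matched."""
    pairs = pairs.assign(bias=pairs.spo2 - pairs.sao2,              # SpO2 - SaO2
                         hidden=(pairs.spo2 >= 94) & (pairs.sao2 < 88))
    return pairs.groupby("stay_id")["hidden"].any()                 # flag per stay
```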
Ellen, J. G.; Hao, S.; Gao, C. A.; Arias, M. D. P.; Viola, M.; Wong, A.-K. I.; Mattie, H.; Parker, W.; Haidau, C.; Matos, J.; Chaves, R. C. d. F.; Celi, L. A.
The Sequential Organ Failure Assessment (SOFA)-2 score was recently validated for ICU mortality prediction across more than 3 million admissions but was not evaluated across demographic subgroups. We assessed the discrimination and calibration of the SOFA-2 score for ICU mortality across subgroups defined by age, sex, race and ethnicity, primary language, and insurance status. We conducted a retrospective cohort study of adult patients (aged 18 years or older) admitted to ICUs at Beth Israel Deaconess Medical Center between 2008 and 2022 (MIMIC-IV, version 3.1), selecting the first ICU admission per patient. First-day SOFA-2 scores (range, 0-24) were calculated using worst recorded values across 6 organ systems. Discrimination was assessed using AUROC, calibration using intercepts and slopes, and subgroup differences using bootstrap resampling. Among 64,015 ICU admissions (median age, 66 years [IQR, 54-78]; 56.1% male; 66.1% White), overall ICU mortality was 7.2% (n=4,596). Overall AUROC was acceptable at 0.77 (95% CI, 0.76-0.77). Notably, discrimination declined significantly with age: AUROC was 0.85 (95% CI, 0.83-0.87) for ages 18-44 and 0.72 (95% CI, 0.70-0.73) for ages 75 and older (difference in AUROC, -0.14; 95% CI, -0.16 to -0.11), with systematic underprediction of mortality in older patients (calibration intercept, 0.39). Discrimination was also significantly lower among non-English speakers (difference in AUROC, -0.04; 95% CI, -0.07 to -0.01) but did not differ significantly across documented racial and ethnic groups. Patients with unknown race/ethnicity (14.3% of the cohort) had nearly double the overall mortality rate and poor calibration. SOFA-2 demonstrated good overall performance for ICU mortality prediction but with clinically meaningful variation across demographic subgroups, particularly a substantial decline in discrimination with advancing age. These findings underscore the need for routine equity evaluation of clinical prediction tools before widespread implementation.
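Calibration intercepts and slopes of the kind reported here are usually obtained by logistic recalibration of the score's predicted risk; one plausible sketch (assuming predicted probabilities strictly between 0 and 1, and not the authors' code):

```python
# Sketch: calibration intercept and slope for one demographic subgroup.
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y, p):
    """y: observed deaths (0/1); p: predicted risk. Ideal: intercept 0, slope 1."""
    lp = np.log(p / (1 - p))                    # linear predictor (logit of risk)
    slope_fit = sm.Logit(y, sm.add_constant(lp)).fit(disp=0)
    intercept_fit = sm.GLM(y, np.ones_like(lp), offset=lp,
                           family=sm.families.Binomial()).fit()   # lp held fixed
    return float(intercept_fit.params[0]), float(slope_fit.params[1])
```

Run per subgroup (age band, language, insurance status) and bootstrap the subgroup differences, as the abstract describes.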
Wiseman, J.; Sibley, S.; Perez-Patrigeon, S.; Mekhaeil, M.; Hanley, M.; Hunt, M.; Boyd, T.; Grant, B.; Boyd, J. G.
Introduction: There is increasing interest in the peripheral administration of vasopressors for two main reasons: (1) to expedite vasopressor initiation in patients with refractory shock and (2) to avoid the potential complications associated with central venous catheter placement. The current evidence on the use of peripheral vasopressor administration is primarily based on single-center observational studies. There are inconsistencies in the administration of peripheral vasopressors, including catheter gauge and location, monitoring practices, vasopressor concentrations, and duration of use. This has made it difficult for institutions to develop best practice guidelines. A randomized controlled trial is needed to address this knowledge gap. Methods and analysis: The Peripheral Use of Low-dose Vasopressors for Safety and Efficacy (PULSE) in the intensive care unit is a prospective, unblinded feasibility study. Eligible patients will be 18 years or older, have no existing central venous catheter or peripherally inserted central catheter, and have the presence of shock requiring a minimum vasopressor dose of any of the following: norepinephrine 0.0625 mcg/kg/min, phenylephrine 0.625 mcg/kg/min, and epinephrine 0.0625 mcg/kg/min. Fifty patients will be randomized 1:1 into either the peripheral venous catheter or central venous catheter group. The primary outcome is feasibility, defined as (1) a recruitment rate of 4 participants per month, (2) a data capture rate of ≥90%, and (3) a <50% conversion rate from peripheral to central access. The secondary outcomes include the safety of peripheral vasopressor use, alive and central-line-free days, the number of attempts needed to place a catheter, volume status, in-hospital mortality rate, ICU and hospital length of stay, and patient-centred outcomes. Implications: The data collected from this study will inform the design of a definitive randomized controlled trial to assess the safety and efficacy of protocol-driven peripheral vasopressor administration. Ethics and dissemination: This study received approval (6042888) from the Queen's University Health Sciences/Affiliated Teaching Hospitals Research Ethics Boards. Results of this study will be presented at critical care conferences and submitted for publication. Trial registration number: NCT06920173 (https://clinicaltrials.gov/study/NCT06920173).
Coupland, L. A.; Frost, S. A.; Lin, J.; Pham, N.; Suryana, E.; Self, M.; Chia, J.; Lam, T.; Liu, Z.; Jaich, R.; Crispin, P.; Rabbolini, D.; Law, R.; Keragala, C.; Medcalf, R.; Aneman, A.
Rationale: Fibrinolysis resistance in sepsis associates with thrombotic burden, multi-organ failure and death. The degrees and dynamics of resistance that associate with mortality in acute sepsis are unknown, and a simple tool to aid clinician interpretation of fibrinolysis measurements is lacking. Objectives: To establish a point-of-care grading tool of fibrinolysis resistance that aligns with scoring systems for disease acuity, is substantiated by plasma fibrinolysis markers and enables rapid investigation of the fibrinolysis state at the point of care. Methods: Prospective observational study of 116 adult sepsis/septic shock patients with sequential measurements of fibrinolysis resistance during Intensive Care Unit (ICU) admission using tissue plasminogen activator (tPA) enhanced viscoelastic testing (VET). The clot lysis time (TPA-LT) adjusted for fibrin clot amplitude (TPA-LT/FIBA10, sec/mm) underwent cluster analysis and was evaluated against disease severity scores, standard pathology, clinical outcomes and fibrinolysis markers. Measurements and Main Results: Three clusters of progressively increasing fibrinolysis resistance were identified (Grades 1-3). At admission, Grade 3 associated with the highest disease severity, organ failure, haematological and biochemical perturbations, fibrinolysis marker inhibitory profile and mortality (42% versus 24% and 15% in Grade 2 and Grade 1, respectively), with a 3.9-fold [95% CI 1.4-11] increased hazard of death at 28 days compared to Grade 1. Transitions between grades were frequent over 7 days, with a reduced grade associated with decreased risk of death. Conclusions: Grading of fibrinolysis resistance in sepsis enables rapid identification of patients at greatest mortality risk, with any dynamic improvement corresponding to favourable clinical outcomes.
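The abstract does not specify the clustering algorithm, but the grading step can be illustrated with a generic one-dimensional cluster analysis of the amplitude-adjusted lysis time; k-means here is a stand-in assumption, not the authors' method.

```python
# Hedged sketch: derive ordinal fibrinolysis-resistance grades (1-3) from
# TPA-LT/FIBA10 measurements by clustering and ordering cluster centers.
import numpy as np
from sklearn.cluster import KMeans

def grade_fibrinolysis(ratio):
    x = np.asarray(ratio, dtype=float).reshape(-1, 1)   # sec/mm per measurement
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
    order = np.argsort(km.cluster_centers_.ravel())     # low -> high resistance
    relabel = {int(c): g + 1 for g, c in enumerate(order)}
    return np.array([relabel[int(c)] for c in km.labels_])
```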
Meza-Fuentes, G.; Delgado, I.; Barbe, M.; Sanchez-Barraza, I.; Filippini, D.; Smit, M. R.; Sinnige, J. S.; Kramer, L.; Smit, J.; Jonkman, A.; Meade, M.; Retamal, M. A.; Lopez, R.; Bos, L. D. J.
Background: Acute respiratory distress syndrome (ARDS) is characterised by substantial physiological heterogeneity, which contributes to highly variable clinical outcomes and therefore inconsistent responses to ventilatory strategies. We aimed to externally validate physiological ARDS subphenotypes previously identified using routine ventilatory and gas-exchange variables, assess their prognostic relevance across independent cohorts, and examine heterogeneity of treatment effect according to PEEP strategy. Methods: Unsupervised Gaussian Mixture Modelling was used to identify physiological subphenotypes based on ventilatory mechanics and gas-exchange parameters. Labels were subsequently used to train and validate supervised classifiers using XGBoost. Prognostic relevance was assessed across three independent cohorts, including two randomised controlled trials (ALVEOLI and LOVS). Predictive enrichment for PEEP strategy was evaluated using individual patient data from ALVEOLI and LOVS (n = 1,532) using intention-to-treat analyses, applying both one-stage and two-stage fixed-effects IPD meta-analytic approaches to test for interaction between physiological subphenotype and PEEP strategy. Results: Two distinct physiological subphenotypes, termed Efficient and Restrictive, were replicated across independent cohorts. Across each cohort, patients classified as Restrictive consistently exhibited higher all-cause 28-day mortality compared to Efficient patients. When pooled across studies, the Restrictive subphenotype was associated with a significantly increased risk of death (pooled odds ratio 1.75, 95% CI 1.36-2.24), with no evidence of between-study heterogeneity. Predictive analyses showed a statistically significant interaction between physiological subphenotype and PEEP strategy in the one-stage IPD model (p for interaction = 0.037), with concordant findings in the two-stage fixed-effects IPD meta-analysis (interaction OR 1.91, 95% CI 1.00-3.66; I2 = 0%). Higher PEEP was associated with increased mortality in Efficient patients and reduced mortality in Restrictive patients, indicating effect modification by physiological subphenotype. Interpretation: Physiological ARDS subphenotypes derived from routinely collected bedside data provide robust and externally validated prognostic stratification across observational and randomised trial cohorts. The observed interaction with PEEP strategy suggests that underlying physiological profiles may influence treatment response, supporting the concept that physiology-based subphenotyping could be a starting point for personalized medicine and therefore better ventilatory strategies in future clinical trials.
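Two steps of this pipeline translate directly into code: unsupervised subphenotyping and the one-stage interaction test. The sketch below uses assumed column names and a plain logistic one-stage model; it is illustrative, not the authors' analysis scripts.

```python
# Sketch: GMM subphenotyping + one-stage subphenotype x PEEP interaction test.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.mixture import GaussianMixture

def assign_subphenotypes(X: pd.DataFrame) -> pd.Series:
    """X: ventilatory mechanics and gas-exchange features. GMM component
    labels are arbitrary; orient them (e.g. by mortality or mechanics)
    before calling one component 'Restrictive'."""
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    return pd.Series(gmm.predict(X), index=X.index, name="restrictive")

def interaction_test(df: pd.DataFrame):
    """df: death28 (0/1), restrictive (0/1), high_peep (0/1), study (trial id)."""
    fit = smf.logit("death28 ~ restrictive * high_peep + C(study)",
                    data=df).fit(disp=0)
    return fit                    # inspect the restrictive:high_peep coefficient
```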
Basilakis, A.
Background: Patient matching in intensive care databases yields sample sizes too small for individualised outcome analysis. Current AI systems provide population-level guideline summaries but omit stratification variables that may invert therapy signals at the individual level. Methods: We developed the Therapeutic Distance framework, which computes the z-standardised distance between a patient's clinical parameters and the centroid of MIMIC-IV patients who received each therapy: d(P,T) = Σ_i w_i(T) · |(L_i − μ_i(T)) / σ_i|. We hypothesise that patients at the same distance to a therapy (same orbit) have comparable outcomes. Six validation experiments were performed on 11,627 sepsis patients (SAPS-II 30-80) from MIMIC-IV v3.1. Results: Echo-stratified vasopressin recipients showed mortality of 30.1% (n=146, 95% CI 22.6-37.7%) versus 53.9% without echo (n=2,426, 95% CI 51.9-55.9%). Confidence intervals did not overlap (bootstrap, 1,000 resamples). However, echo-stratified patients had lower general severity (SAPS-II 49.2 vs 53.9) but higher cardiac biomarkers (troponin 1.0 vs 0.51 ng/mL), indicating that the observed difference is compatible with both severity confounding and a possible cardiac-specific vasopressin effect. Leave-one-out prediction with uniform weights achieved AUC 0.61 as a structural baseline. Conclusions: Therapeutic Distance replaces patient matching with orbit matching, substantially increasing usable sample sizes. The echo-vasopressin finding is hypothesis-generating and mechanistically plausible but not causally proven. The framework is intended as a clinical decision support signal under uncertainty, not as a causal inference method.
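The distance formula transcribes directly into code; the weights w_i(T), centroids μ_i(T), and scales σ_i would be estimated from the MIMIC-IV therapy cohorts, so the function below is only the scoring step.

```python
# Direct transcription of d(P,T) = sum_i w_i(T) * |(L_i - mu_i(T)) / sigma_i|.
import numpy as np

def therapeutic_distance(patient, mu_T, sigma, w_T):
    """patient: lab/vital values L_i; mu_T, w_T: therapy-specific centroid and
    weights; sigma: population scales. Patients with similar d(P,T) share an
    orbit and are pooled for outcome analysis."""
    patient, mu_T, sigma, w_T = map(np.asarray, (patient, mu_T, sigma, w_T))
    return float(np.sum(w_T * np.abs((patient - mu_T) / sigma)))
```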
Born, G.
Background: Quality measurement in intensive care emphasizes task completion: whether assessments were documented and protocols followed. Electronic health record (EHR) systems capture these signals in real time, yet current metrics cannot distinguish task completion from cognitive clinical engagement. A prior analysis demonstrated that omission of orientation assessment predicted a 4.29-fold increase in hospital mortality among low-acuity ICU patients [1]. Whether combining this marker with routine task-completion data yields a computable phenotype with independent prognostic value has not been studied. Objective: To define, validate, and characterize "discordant care", a computable EHR phenotype defined as completion of ≥6 of 8 routine nursing assessments without orientation assessment documentation, as a predictor of hospital mortality, distinguishing patient-level confounding from care process signal. Methods: Retrospective cohort study using MIMIC-IV v3.1 (2008-2022), including 46,004 adult ICU stays with SOFA scores 0-2 and length of stay ≥24 hours in non-neurological ICUs. Primary exposure: discordant care, computed from structured nursing flowsheet data within 24 hours of admission. Primary outcome: hospital mortality. Progressive covariate adjustment included mechanical ventilation, sedation, and diagnosis. Results: Discordant care was present in 8,891 patients (19.3%), with 69.7% mechanically ventilated versus 25.3% of concordant patients. Two overlapping signals were identified: a patient-level signal driven by ventilation/sedation (full adjustment OR 1.19, 95% CI 1.09-1.30) and a care process signal in non-ventilated patients (OR 2.14, 1.87-2.44; N=30,314). Among non-ventilated SOFA 0 patients, the OR was 2.60 (2.13-3.18; N=16,295). The signal was present across all 7 major diagnosis categories. Quantitative bias analysis indicated unmeasured delirium could attenuate but likely not fully explain the non-ventilated signal. Conclusions: Discordant care identifies two phenomena: a patient-level signal from ventilation/sedation and a care process signal where assessable patients receive routine care without cognitive engagement (OR 2.14-2.60). This care process signal is invisible to existing quality metrics and detectable in real time. Prospective validation with systematic delirium screening is needed.
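The phenotype itself is computable in a few lines once the first-24-hour flowsheet is pivoted to one row per stay; the eight routine assessment names below are placeholders, since the abstract does not enumerate them.

```python
# Sketch of the discordant-care flag: >= 6 of 8 routine assessments documented
# without an orientation assessment. Column names are assumptions.
import pandas as pd

ROUTINE = ["pain", "fall_risk", "skin", "iv_site",
           "bowel", "respiratory", "cardiac", "safety"]   # hypothetical items

def discordant_care(first24: pd.DataFrame) -> pd.Series:
    """first24: one row per ICU stay, boolean column per documented assessment."""
    routine_done = first24[ROUTINE].sum(axis=1)
    return (routine_done >= 6) & ~first24["orientation"]
```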
Yawata, S.; Uchino, S.; Yamashima, S.; Nishiyama, S.; Ono, S.; Sasabuchi, Y.; Katayama, S.
Background: The role of arterial blood gas (ABG) testing in the intensive care unit (ICU) remains debated within the "less is more" paradigm. While unnecessary testing may pose risks without benefit, timely ABGs provide critical information in unstable patients. Institutional variation in early ABG utilization and its association with outcomes remains unclear. Methods: We conducted a multicenter retrospective cohort study using the Japanese Intensive Care PAtient Database (JIPAD) between April 2015 and March 2023. Adult ICU patients with a stay ≥24 h and arterial line placement were included. The standardized number of ABGs (SNABGs) within the first 24 h was calculated as the ratio of observed to expected values, where expectations were derived from a multivariable model adjusting for patient covariates. ICUs were categorized into tertiles according to SNABG utilization. The primary outcome was in-hospital mortality, analyzed using multilevel logistic regression with ICU-level random intercepts. Restricted cubic splines were used to explore non-linear associations. Results: Among 117,546 patients from 87 ICUs, the mean number of ABGs varied widely. After standardization, SNABGs ranged from 0.73-0.90 in the low tertile to 1.09-1.15 in the high tertile. In the multilevel model, SNABG was not significantly associated with in-hospital mortality (adjusted OR 0.942 [95% CI 0.807-1.100] for tertile 2; 0.874 [95% CI 0.751-1.017] for tertile 3). Flexible modeling suggested a non-linear trend toward better outcomes with higher utilization, but confidence intervals included unity. Conclusion: Early ABG utilization varied across ICUs, yet was not significantly associated with mortality. Sensitivity analysis suggested a non-linear relationship, with a tendency toward better outcomes at higher utilization. These findings warrant further investigation to clarify the role of early ABG utilization in critical care.
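The observed-to-expected standardization can be sketched as a count model over patient covariates followed by per-ICU aggregation; Poisson regression and the covariate names are assumptions, since the abstract says only "multivariable model".

```python
# Sketch: standardized number of ABGs (SNABG) as observed/expected per ICU.
import pandas as pd
import statsmodels.formula.api as smf

def snabg_by_icu(df: pd.DataFrame) -> pd.Series:
    """df: n_abg_24h, icu_id, plus case-mix covariates (hypothetical names)."""
    fit = smf.poisson("n_abg_24h ~ age + severity + emergency_admission",
                      data=df).fit(disp=0)
    df = df.assign(expected=fit.predict(df))          # expected count per patient
    g = df.groupby("icu_id")
    return g["n_abg_24h"].sum() / g["expected"].sum() # SNABG per ICU
```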
Carioca, F. D. L.; Franzon, N. H.; Krzesinski, L. d. S.; Ferraz, I. d. S.; Nogueira, R. J. N.; De Souza, T. H.
Objectives: To develop and validate pediatric adaptations of the Venous Excess Ultrasound Score (P-VExUS) for noninvasive estimation of central venous pressure (CVP) in critically ill children. Design: Prospective observational study. Setting: PICU of a tertiary-care teaching hospital. Patients: Fifty-six mechanically ventilated children (median age 7.4 months, median weight 6.0 kg) with central venous catheters. Interventions: None. Measurements and Main Results: Venous Doppler ultrasonography of the inferior vena cava, hepatic, portal, and intrarenal veins was performed at the bedside. Two P-VExUS models were tested: (1) a categorical grading system (0-III) and (2) a semiquantitative point-based score (0-7). Both models showed significant associations with CVP. For predicting elevated CVP (>12 mmHg), model 1 achieved an AUROC of 0.74 (95% CI 0.61-0.85) with 45% sensitivity and 98% specificity, while model 2 demonstrated superior accuracy with an AUROC of 0.94 (95% CI 0.84-0.98), sensitivity 82%, and specificity 91% (p < 0.001). For detecting low CVP (<7 mmHg), model 2 also outperformed model 1 (AUROC 0.80 vs. 0.69, p = 0.02). Among individual venous Doppler components, intrarenal veins had the highest discriminative ability (AUROC 0.92), followed by hepatic (0.89) and portal (0.80) veins. Conclusions: Two pediatric-specific P-VExUS models were feasible and accurate for estimating CVP in critically ill children. The point-based model (model 2) demonstrated superior diagnostic performance, supporting its potential as a noninvasive tool to assess venous congestion at the bedside.

Research in Context:
- Venous congestion, reflected by elevated central venous pressure (CVP), is associated with adverse outcomes in critically ill children, including mortality and renal dysfunction.
- The Venous Excess Ultrasound Score (VExUS) is validated in adults, but pediatric-specific adaptations and cutoff values remain poorly defined.
- There is a need for noninvasive, bedside tools to estimate CVP in children and guide fluid management in the PICU.

What This Study Means:
- This study validates pediatric-specific adaptations of the Venous Excess Ultrasound Score (P-VExUS) for estimating CVP in critically ill children.
- The semiquantitative point-based model provided more consistent and accurate discrimination of venous congestion compared with categorical grading.
- These findings highlight the feasibility and potential clinical utility of venous Doppler ultrasonography as a noninvasive bedside tool in the PICU.
Armenta Salas, M.; Zhang, A.; Girard, T. D.; Devlin, J. W.; Barr, J.
BACKGROUND: Delirium is common in critically ill adults but often goes unrecognized and undertreated. Little is known about the perceptions of ICU nurse and physician leaders regarding ICU delirium detection and management and the potential role of objective continuous delirium monitoring to facilitate ICU delirium care. RESEARCH QUESTION: What are the perceptions of ICU leaders regarding the current challenges associated with delirium recognition and management and the potential benefits of continuous delirium monitoring? STUDY DESIGN AND METHODS: We conducted a blinded, cross-sectional, electronic survey of ICU leaders across the U.S., including physician directors and nursing managers with ≥3 years of ICU leadership experience. We asked about perceptions of the effectiveness of current delirium clinical assessment tools, current delirium detection and management challenges, and how an objective, continuous delirium monitoring system might impact clinician practice and patient outcomes in their ICU. RESULTS: Among the 81 respondents (62 physicians, 19 nurses), most (76%) reported that recommended delirium assessment tools (CAM-ICU, ICDSC) are used in their ICUs, though there were mixed perceptions on how reliably they are conducted. A majority (63-90%) perceived that current bedside assessments delay and limit the recognition of ICU delirium. Nearly all (89%) agreed an objective delirium monitoring tool would be more clinically valuable than current delirium assessment tools and that it would support real-time delirium management by clinicians. CONCLUSIONS: ICU leaders perceive that there are limitations to using clinical delirium assessment tools in ICU patients to effectively detect and manage ICU delirium. Most felt that an objective delirium monitor could facilitate delirium detection and potentially expedite appropriate delirium management in patients.
Tjepkema-Cloostermans, M. C.; Beishuizen, A.; Strang, A. C.; Keijzer, H. M.; Telleman, J. A.; Smook, S. P.; Vermeijden, J. W.; Hofmeijer, J.; van Putten, M. J. A. M.
Objective: Despite substantial variability in the severity of post-anoxic encephalopathy, all comatose patients after cardiac arrest are usually treated according to the same standardized intensive care protocol, including sedation, mechanical ventilation, and targeted temperature management (TTM). We hypothesize that patients with a favourable EEG pattern (continuous EEG within 12 hours after cardiac arrest) may not benefit from prolonged sedation and TTM. We studied the feasibility and safety of early cessation of sedation and TTM in this subgroup. Methods: We conducted a non-randomized, controlled intervention study including 40 adult patients admitted to the ICU with postanoxic encephalopathy after cardiac arrest and an early (<12 hours) favourable EEG pattern. The control group received standard care with sedation and TTM for at least 24-48 hours, whereas the intervention group underwent early cessation of sedation and TTM as soon as possible after establishing a favourable EEG, followed by weaning from mechanical ventilation. The primary outcome was duration of mechanical ventilation. Secondary outcomes included ICU length of stay, total sedation time, number of ICU complications, and neurological outcomes at 3 and 6 months. Results: Duration of mechanical ventilation was significantly shorter in the intervention than in the control group (median 12 vs 28 h, p < 0.001). Median ICU length of stay and median total sedation time were also reduced by more than 50% in the intervention group, from 2.5 to 1.2 days (p = 0.001) and from 27 to 12 h (p < 0.001), respectively. There was no increase in ICU complications in the intervention group. No statistically significant differences in neurological outcomes at 3 or 6 months were observed. Conclusion: Early withdrawal of sedation is feasible and safe in patients with an early favourable EEG following cardiac arrest. The study was underpowered to detect possible differences in long-term neurological recovery. Significance: Shortening sedation and mechanical ventilation is likely to result in direct reductions in healthcare costs and contribute to more appropriate care. Larger studies are needed to evaluate the impact on long-term neurological outcomes.
Gjertsen, M.; Yoon, W.; Afshar, M.; Temte, B.; Leding, B.; Halliday, S.; Bradley, K.; Kim, J.; Mitchell, J.; Sanders, A. K.; Croxford, E. L.; Caskey, J.; Churpek, M. M.; Mayampurath, A.; Gao, Y.; Miller, T.; Kruser, J. M.
Importance: Physicians routinely prognosticate to guide care delivery and shared decision making, particularly when caring for patients with critical illnesses. Yet these physician estimates are prone to inaccuracy and uncertainty. Artificial intelligence, including large language models (LLMs), shows promise in supporting or improving this prognostication. However, the performance of contemporary LLMs in prognosticating for the heterogeneous population of critically ill patients remains poorly understood. Objective: To characterize and compare the performance of LLMs and physicians when predicting 6-month mortality for hospitalized adults who survived critical illness. Design: Embedded mixed methods study with elicitation and comparison of prognostic estimates and reasoning from LLMs and practicing physicians. Setting: The publicly available, deidentified Medical Information Mart for Intensive Care (MIMIC)-IV v2.2 dataset. Participants: We randomly selected 100 hospitalizations of adult survivors of critical illness. Four contemporary LLMs (OpenAI GPT-4o, o3-mini, and o4-mini; DeepSeek-R1) and 7 physicians provided independent prognostic estimates for each case (1,100 total estimates; 400 LLM and 700 physician). Main outcomes and measures: For each case, LLMs and physicians used the hospital discharge summary and demographics to predict 6-month mortality (yes/no) and provide their reasoning (free text). We assessed prognostic performance using accuracy, sensitivity, and specificity, and used inductive, qualitative content analysis to characterize the reasoning. Results: Mean physician accuracy for predicting mortality was 70.1% (95% CI 63.7-76.4%), with sensitivity of 59.7% (95% CI 50.6-68.8%) and specificity of 80.6% (95% CI 71.7-88.2%). The top-performing LLM (OpenAI o4-mini) accuracy was 78.0% (95% CI 70.0-86.0%), with sensitivity of 80.0% (95% CI 67.4-90.2%) and specificity of 76.0% (95% CI 63.3-88.0%). The difference between mean physician and top-performing LLM accuracy was not statistically significant (p = 0.5). Qualitative analysis revealed similar patterns in LLM and physician expressed reasoning, except that physicians regularly and explicitly reported uncertainty while LLMs did not. Conclusion and Relevance: In this study, LLMs and physicians achieved comparable, moderate performance in predicting 6-month mortality after critical illness, with similar patterns in expressed reasoning. Our findings suggest LLMs could be used to support prognostication in clinical practice but also raise safety concerns due to the lack of LLM uncertainty expression.
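The elicitation step maps naturally onto a chat-completion call; the sketch below shows one plausible setup with an illustrative prompt (not the study's protocol, and it assumes an OpenAI API key in the environment).

```python
# Hedged sketch: elicit a yes/no 6-month mortality prediction with reasoning.
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set

def predict_mortality(discharge_summary: str, demographics: str) -> str:
    prompt = (
        "Given the demographics and hospital discharge summary below, will this "
        "ICU survivor die within 6 months? Answer YES or NO, then briefly "
        "explain your reasoning.\n\n"
        f"Demographics: {demographics}\n\nDischarge summary: {discharge_summary}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",                    # one of the four models studied
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```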
Sines, B. J.; Hagan, R. S.; Jiang, X.; Pavlechko, E.; McClain, S.; Hunt, X.; Florou-Moreno, J.; Acquardo, J.; Risa, G.; Valsaraj, V.; Schisler, J. C.; Wolfgang, M. C.
Objective: To develop a workflow that transforms electronic health record data into machine learning-ready features for molecular endotype assignment and to evaluate whether clinician-informed feature engineering improves model performance and interpretability. Materials and Methods: We developed parallel clinician-informed and clinician-agnostic feature engineering pipelines to prepare raw EHR data from mechanically ventilated patients with respiratory failure. Molecular endotype labels derived from paired deep lung and blood profiling of subjects with acute lung injury were used to train candidate machine learning classifiers. Champion models from each pipeline were compared on predefined performance metrics. Results: Bayesian network classifiers were the top-performing models in both pipelines. The clinician-informed pipeline generated fewer features than the clinician-agnostic pipeline (645 vs 1,127) and produced a lower misclassification rate in the final Bayesian network model (0.047 vs 0.14). In an independent cohort of subjects with acute lung injury, the clinician-informed model better distinguished corticosteroid-responsive from non-responsive subgroups. Discussion: Clinical context improved feature engineering efficiency, model interpretability, and classification performance. These findings support the integration of domain expertise into machine learning workflows intended for critical care implementation. Conclusions: Clinician-informed feature engineering can simplify machine learning models while improving performance and preserving clinical relevance. AI tools developed for healthcare should incorporate subject matter expertise early in the feature engineering and analytic workflow.
Berg, N. K.; Kerchberger, V. E.; Pershad, Y.; Corty, R. W.; Bick, A. G.; Ware, L. B.
Rationale: Sepsis is a life-threatening syndrome causing significant morbidity and mortality, especially in the aging population. Clonal hematopoiesis of indeterminate potential (CHIP) is an age-related condition of clonal expansion of hematopoietic stem cells harboring somatic mutations associated with increased incidence of chronic illness and all-cause mortality. Objective: Evaluate the association of pre-illness CHIP with mortality and morbidity in patients admitted to the ICU with sepsis. Methods: We performed a retrospective study using a de-identified electronic health record linked with a DNA biorepository. We identified adult patients with sepsis who had DNA collected prior to ICU admission. We tested the association between CHIP status, determined from whole-genome sequencing, and ICU mortality, organ support-free days, and long-term survival, adjusting for age, sex, race and Sequential Organ Failure Assessment (SOFA) score on ICU admission. Measurements and Main Results: Pre-illness CHIP was associated with increased sepsis mortality (OR = 1.54, 95% CI 1.13 to 2.07, P = 0.005) and fewer days alive and free of organ support (-1.7 days, 95% CI -3.2 to -0.2, P = 0.028) after adjusting for age, sex, race, and SOFA score. In sepsis survivors, CHIP was also associated with increased long-term mortality after discharge (HR 1.40, 95% CI 1.01 to 1.93, P = 0.041). Conclusions: Pre-illness CHIP was independently associated with increased mortality and morbidity in critically ill adults with sepsis. These findings suggest that CHIP is a risk factor for sepsis severity. Elucidating the mechanism underlying this association could uncover new therapeutic interventions for sepsis.
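The covariate-adjusted association reported here corresponds to a standard logistic model; a sketch with assumed column names for the linked EHR-biobank table:

```python
# Sketch: adjusted odds ratio for CHIP on ICU mortality in sepsis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def chip_mortality_model(df: pd.DataFrame):
    """df: icu_death (0/1), chip (0/1), age, sex, race, sofa (assumed names)."""
    fit = smf.logit("icu_death ~ chip + age + C(sex) + C(race) + sofa",
                    data=df).fit(disp=0)
    return fit, float(np.exp(fit.params["chip"]))   # adjusted OR for CHIP
```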
Peltekian, A. K.; Liao, W.-T.; Guggilla, V.; Markov, N. S.; Senkow, K.; Liao, Z.; Kang, M.; Rasmussen, L. V.; Tavernier, E.; Ehrmann, S.; Clepp, R. K.; Stoeger, T.; Walunas, T.; Choudhary, A. N.; Misharin, A. V.; Singer, B. D.; Budinger, G. S.; Wunderink, R. G.; Gao, C. A.; Agrawal, A.
Purpose: Ventilator-associated pneumonia (VAP) remains one of the most serious hospital-acquired infections in the intensive care unit (ICU), with high morbidity and mortality. Early identification of patients at risk for developing VAP could enable timely diagnostics and intervention. However, current clinical tools are limited in their ability to detect early physiologic signals preceding VAP onset. We aimed to build supervised machine learning models to predict short-term onset of VAP. Methods: We analyzed electronic health record data from a prospective observational cohort of ICU patients, where VAP was adjudicated using a standardized published protocol by a panel of critical care physicians. Clinical features (including vital signs, ventilator settings, laboratory values, and support devices) were extracted for each patient-ICU-day. We explored unsupervised clustering to characterize feature dynamics associated with VAP onset. We built multiple machine learning models across different prediction windows (3, 5, and 7 days before VAP). We examined model performance in two external cohorts: MIMIC-IV and a secondary analysis of the AMIKINHAL trial. Results were evaluated with discrimination metrics such as AUROC. Results: The internal cohort included 507 patients with BAL-confirmed diagnoses: 261 developed VAP and 246 did not. Visualization using clustering identified distinct physiologic states enriched for VAP-labeled days. The best-performing model achieved an AUROC of 0.866 in predicting VAP up to seven days before clinical diagnosis. Temporal model probability trajectories showed rising model confidence in the days leading up to VAP. On external validation in MIMIC-IV, the best model achieved an AUROC of 0.817 for forecasting VAP within five days. There was low feature overlap with the AMIKINHAL trial data, leading to poor model performance. Feature analysis revealed that platelet count, positive end-expiratory pressure (PEEP), ventilator duration, and inflammatory markers were key drivers of model predictions. Conclusions: Machine learning models trained on routinely collected ICU data with careful labeling can anticipate VAP onset up to a week in advance with strong predictive performance. Model performance generalized to data from an entirely different hospital system despite differences in practice and labeling patterns, but did not perform well when there was poor feature overlap. Future work should focus on real-time prospective evaluation.
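Label construction for the fixed prediction windows can be sketched as follows, with assumed columns for the per patient-ICU-day feature table:

```python
# Sketch: mark each patient-ICU-day as positive if adjudicated VAP onset
# falls within the next k days (k = 3, 5, or 7 as in the abstract).
import pandas as pd

def label_window(days: pd.DataFrame, k: int = 7) -> pd.Series:
    """days: patient_id, icu_day (int), vap_onset_day (int, NaN if no VAP)."""
    horizon = days["vap_onset_day"] - days["icu_day"]
    return (horizon > 0) & (horizon <= k)   # NaN comparisons are False (no VAP)
```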